Big Data-Enabled Internet of Things by Khan Muhammad Usman Shahid; Khan Samee U.; Zomaya Albert Y

Author: Khan, Muhammad Usman Shahid; Khan, Samee U.; Zomaya, Albert Y.
Language: eng
Format: epub
Publisher: Institution of Engineering & Technology
Published: 2020-04-29T16:00:00+00:00


11.5.1.2 Hierarchical-clustering methods

Methods that hierarchically decompose an entire dataset are known as hierarchical-clustering methods. They are categorized by how the dendrogram is built: agglomeratively or divisively. In agglomerative methods, each point starts as a distinct cluster, and at every step the two closest clusters are merged; this continues until a stopping criterion is reached. By contrast, in divisive methods, all points start in a single cluster, which is split into smaller clusters at each subsequent step until a stopping criterion is reached. Agglomerative methods are more common than divisive methods. Examples of hierarchical methods include BIRCH [141], CHAMELEON [142], CURE [143], and MST [144] clustering.
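
The agglomerative procedure described above can be sketched in a few lines. The following is a minimal illustration, not any of the cited algorithms: it assumes 1-D points, single-linkage distance, and a fixed target number of clusters as the stopping criterion.

```python
# Minimal sketch of agglomerative (bottom-up) hierarchical clustering.
# Assumptions (illustrative, not from the cited methods): 1-D points,
# single-linkage distance, stopping at a fixed number of clusters.

def single_linkage_distance(a, b):
    """Smallest pairwise distance between two clusters of 1-D points."""
    return min(abs(x - y) for x in a for y in b)

def agglomerative_cluster(points, n_clusters):
    # Each point starts as its own distinct cluster.
    clusters = [[p] for p in points]
    # Repeatedly merge the two closest clusters until the stopping
    # criterion (the desired number of clusters) is reached.
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

clusters = agglomerative_cluster([1.0, 1.2, 5.0, 5.1, 9.0], 3)
print(sorted(sorted(c) for c in clusters))  # → [[1.0, 1.2], [5.0, 5.1], [9.0]]
```

A divisive method would run the same loop in reverse, starting from one all-inclusive cluster and splitting at each step; production implementations (e.g., BIRCH, CURE) add summarization structures to avoid the quadratic distance scan shown here.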

In one study [145], a hierarchical k-means algorithm was proposed to improve clustering performance on large datasets. The method simplifies the dataset and then gradually restores it to its original state, producing a succession of good-quality initial centroids along the way. It was evaluated on an advanced metering infrastructure (AMI) dataset and benchmarked against common clustering methods in terms of standard adequacy indices, outlier detection, and computational time.
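
The two-stage seeding idea can be illustrated as follows. This is a hedged sketch of the general pattern (cluster a simplified dataset first, then use the resulting centroids as initial centroids on the full data), not the algorithm of [145]; the 1-D data, the every-third-point sampling rule, and the function names are all assumptions made for brevity.

```python
# Illustrative two-stage k-means seeding: cluster a simplified
# (sub-sampled) dataset, then refine on the full dataset from the
# resulting good-quality initial centroids. Not the method of [145].

def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: abs(p - centroids[i]))
            groups[idx].append(p)
        # Update step: move each centroid to the mean of its group.
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

def two_stage_kmeans(points, k):
    # Stage 1: simplify the dataset (here: keep every third point,
    # an arbitrary illustrative choice) and cluster the reduced set.
    sample = points[::3]
    seeds = kmeans(sample, sample[:k])
    # Stage 2: restore the full dataset and refine from the seeds.
    return kmeans(points, seeds)
```

Because stage 1 runs on far fewer points, most of the iterative cost is paid on a small problem, and stage 2 typically converges in few iterations from the good seeds.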

Another hierarchical method for OD is the hierarchical maximum likelihood (HML) clustering algorithm created by Sharma et al. [146]. The HML algorithm requires neither the calculation of triple integrals nor the first and second derivatives of the likelihood function. Furthermore, because it takes the covariance matrices of the clusters into account during clustering, the HML algorithm can be used with small sample sizes in which the data dimensionality exceeds the number of samples.
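
Why cluster covariance matters can be shown with a per-cluster Gaussian log-likelihood. The sketch below is a simplified illustration of likelihood-based assignment, not the HML algorithm itself; it assumes a diagonal covariance (per-dimension variances) so that no matrix inversion is needed.

```python
# Illustrates covariance-aware cluster scoring via a Gaussian
# log-likelihood with diagonal covariance (a simplifying assumption;
# this is not the HML algorithm of Sharma et al. [146]).
import math

def gaussian_log_likelihood(point, mean, variances):
    """Log-density of `point` under a diagonal-covariance Gaussian."""
    ll = 0.0
    for x, m, v in zip(point, mean, variances):
        ll += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return ll

# A point can be more likely under a farther cluster whose covariance
# stretches toward it than under a nearer but tighter cluster.
tight = gaussian_log_likelihood([2.0, 0.0], [0.0, 0.0], [0.5, 0.5])
loose = gaussian_log_likelihood([2.0, 0.0], [4.0, 0.0], [4.0, 4.0])
print(loose > tight)  # → True
```

A purely distance-based assignment would pick the nearer centroid; the likelihood-based score reverses that decision once cluster spread is accounted for, which is the intuition behind covariance-aware clustering.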

According to Aggarwal [288], outlier ensemble methods can be categorized as sequential, independent, model-centered, or data-centered ensembles. One study examined multiple data views to address the problems surrounding unsupervised clustering of high-dimensional multi-view datasets, designing an algorithm that learns discriminative subspaces without supervision. The method relies on the premise that a reliable clustering assigns same-class samples to the same cluster in every view. In contrast, a spectral-clustering algorithm was proposed in [147] for a multi-view setting with access to multiple views of the data, in which each view was clustered independently. Dhandapani et al. [148] reduced time and space complexity and produced an automated hierarchical density shaving (Auto-HDS) cluster hierarchy on a very large dataset (hundreds of millions of data points) by using a partitioned HDS.

Despite growing research interest in this area, very few studies have addressed identifying outliers in multi-view data. Researchers with experience in this area include Das et al. [149], Janeja and Palanisamy [150], Müller et al. [151], Gao et al. [152,153], Alvarez et al. [154], and Zhao et al. [155]. Das et al. [149] drew on multiple kernel learning to create a heterogeneous anomaly detection method. Janeja and Palanisamy [150] found outliers in spatial datasets by looking across multiple domains. Müller et al. [151] employed subspace-analysis approaches to build an outlier-ranking method for multi-view data.


